On the Stratification of Multi-label Data
Abstract
Stratified sampling is a sampling method that takes into account the existence of disjoint groups within a population and produces samples in which the proportions of these groups are maintained. In single-label classification tasks, groups are differentiated based on the value of the target variable. In multi-label learning tasks, however, where there are multiple target variables, it is not clear how stratified sampling should be performed. This paper investigates stratification in the context of multi-label data. It considers two stratification methods for multi-label data and empirically compares them, along with random sampling, on a number of datasets and according to a number of evaluation criteria. The results lead to conclusions about the utility of each method for particular types of multi-label datasets.
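To make the idea concrete, here is a minimal sketch of stratified fold assignment in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names `stratified_folds` and `labelset_key` are hypothetical, and treating each distinct label combination as its own stratum is just one plausible way to extend the single-label case to multi-label data. The abstract does not say which two methods the paper evaluates.

```python
import random
from collections import defaultdict


def stratified_folds(keys, k, seed=0):
    """Split example indices into k folds so that each stratum
    (each distinct value in `keys`) is spread evenly across folds.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group example indices by their stratum key.
    by_stratum = defaultdict(list)
    for idx, key in enumerate(keys):
        by_stratum[key].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across strata to balance fold sizes
    for key in sorted(by_stratum, key=repr):
        members = by_stratum[key]
        rng.shuffle(members)
        # Deal the members of this stratum to the folds one at a time.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def labelset_key(labels):
    """Turn a set of labels into a hashable stratum key (assumption:
    one stratum per distinct label combination)."""
    return tuple(sorted(labels))


# Single-label case: the stratum is the class value itself.
single = ["a"] * 6 + ["b"] * 3
single_folds = stratified_folds(single, k=3)

# Multi-label case: the stratum is the whole label combination.
multi = [{"x"}] * 4 + [{"x", "y"}] * 4 + [{"y"}] * 2
multi_folds = stratified_folds([labelset_key(s) for s in multi], k=2)
```

With many labels, the number of distinct label combinations can approach the number of examples, leaving most strata with a single member. In that regime this sketch behaves much like random sampling, which is one reason the multi-label case is less clear-cut than the single-label one.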
Similar resources
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
A Network Perspective on Stratification of Multi-Label Data
In recent years, we have witnessed the development of multi-label classification methods which utilize the structure of the label space in a divide and conquer approach to improve classification performance and allow large data sets to be classified efficiently. Yet most of the available data sets have been provided in train/test splits that did not account for maintaining a distribution of...
A Survey of Social Factors Influencing Social Consensus (Case Study: Bushehr Civic Families)
The aim of this research is to study the social factors influencing social consensus. The sampling method was multi-process, combining cluster and multistage sampling, and the sample size, based on Cochran's formula, was 380 persons. The data collection tool was a questionnaire. In this research, the methods of data analysis were the independent t-test, Spearman correlation coefficient, Multivariate Regressio...
Application of pH Indicator Label Based on Beetroot Color for Determination of Milk Freshness
Introduction: Applying a new indicator in food packaging can be an effective way to inform consumers about the freshness and quality of products. Materials and Methods: In the current study, a new milk freshness label containing beetroot color and multiple layers of polystyrene was investigated. The label characteristics were investigated by estimating color number, release test, and scanning ele...